[SPARK-34615][SQL] Support java.time.Period as an external type of the year-month interval type#31765
[SPARK-34615][SQL] Support java.time.Period as an external type of the year-month interval type#31765MaxGekk wants to merge 1 commit intoapache:masterfrom
java.time.Period as an external type of the year-month interval type#31765Conversation
|
Kubernetes integration test starting |
|
Kubernetes integration test status success |
java.time.Period as an external type of the year-month interval typejava.time.Period as an external type of the year-month interval type
|
@cloud-fan @yaooqinn @srielau @HyukjinKwon Could you review this PR, please |
| * @return The total number of months in the period, may be negative | ||
| * @throws ArithmeticException If numeric overflow occurs | ||
| */ | ||
| def periodToMonths(period: Period): Int = { |
There was a problem hiding this comment.
Shall we fail if the day field is not 0? Or at least give a warning?
There was a problem hiding this comment.
We don't fail when we convert:
java.sql.Datehas time component with millisecond precision but we ignore it when we convert to days atjava.sql.Timestampwhich has nanoseconds precision:java.time.Instantwhich contains nanoseconds, and we don't fail when we convert it to microseconds:
To be consistent with current implementation for other types, I do believe we should not fail.
Or at least give a warning?
This will just fill in the logs by useless records, and this is again inconsistent with current implementation.
|
thanks, merging to master! |
|
Late LGTM! |
What changes were proposed in this pull request?
In the PR, I propose to extend Spark SQL API to accept
java.time.Periodas an external type of recently added new Catalyst type -YearMonthIntervalType(see #31614). The Java classjava.time.Periodhas similar semantic to ANSI SQL year-month interval type, and it is the most suitable to be an external type forYearMonthIntervalType. In more details:PeriodConverterwhich convertsjava.time.Periodinstances to/from internal representation of the Catalyst typeYearMonthIntervalType(toInttype). ThePeriodConverterobject uses new methods ofIntervalUtils:periodToMonths()converts the input period to the total length in months. If this period is too large to fitInt, the method throws the exceptionArithmeticException. Note: the input period has "days" precision, the method just ignores the days unit.monthToPeriod()obtains ajava.time.Periodrepresenting a number of months.YearMonthIntervalTypeinRowEncodervia the methodscreateDeserializerForPeriod()andcreateSerializerForJavaPeriod().java.time.Periodinstances.Why are the changes needed?
java.time.Periodcollections, and construct year-month interval columns. Also to collect such columns back to the driver side.Does this PR introduce any user-facing change?
The PR extends existing functionality. So, users can parallelize instances of the
java.time.Durationclass and collect them back:How was this patch tested?
CatalystTypeConvertersSuiteto check conversion from/tojava.time.Period.RowEncoderSuite.YearMonthIntervalTypeare tested inLiteralExpressionSuite.DatasetSuiteandJavaDatasetSuite.IntervalUtilsSuitesto check conversionsjava.time.Period<-> months.